
Conversation

@FeiDaLI (Contributor) commented Oct 5, 2025

What type of PR is this?
feat(llm-katan-server): add a lightweight real-LLM backend with extended context length. The server forwards OpenAI-compatible requests to a running llm-katan instance and returns its responses.

What this PR does / why we need it:

This PR introduces a new LLM Katan Server that serves as a lightweight, real LLM backend alternative to mock-vllm.

  1. Real LLM Backend: Adds a FastAPI wrapper around llm-katan that returns actual LLM completions instead of mock responses
  2. Extended Context Length: Removes the 512-token truncation limit, raising it to 100k tokens to prevent prompt truncation and the information loss it causes
  3. OpenAI-Compatible API: Keeps the same API design as mock-vllm for seamless integration
  4. Updated Documentation: Adds setup instructions and usage examples
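
The forwarding behaviour described in points 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not the PR's actual app.py: the endpoint path and JSON fields follow the OpenAI chat-completions convention, and `LLM_KATAN_URL` plus `build_forward_request` are hypothetical names used here for clarity.

```python
import json
import urllib.request

# Hypothetical address of the running llm-katan instance the server proxies to.
LLM_KATAN_URL = "http://localhost:8000"

def build_forward_request(body: dict) -> urllib.request.Request:
    """Wrap an OpenAI-compatible chat request for forwarding to llm-katan.

    The request body is passed through unchanged, so any OpenAI client
    talking to this server sees the same schema it expects from vLLM.
    """
    payload = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{LLM_KATAN_URL}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example: an OpenAI-style chat request is forwarded verbatim.
req = build_forward_request(
    {"model": "qwen", "messages": [{"role": "user", "content": "hi"}]}
)
```

In the real server a framework such as FastAPI would receive the incoming request, call something like the function above, and stream the backend's response back to the client.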


netlify bot commented Oct 5, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: e034d8e
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68e2640837818900089b6901
😎 Deploy Preview: https://deploy-preview-348--vllm-semantic-router.netlify.app


github-actions bot commented Oct 5, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/llm-katan-server/Dockerfile
  • tools/llm-katan-server/README.md
  • tools/llm-katan-server/app.py
  • tools/llm-katan-server/requirements.txt

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .pre-commit-config.yaml

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/lib.rs

📁 e2e-tests

Owners: @yossiovadia
Files changed:

  • e2e-tests/06-pii-detection-test.py

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/training/classifier_model_fine_tuning/ft_linear.py
  • src/training/dual_classifier/dual_classifier.py
  • src/training/dual_classifier/trainer.py
  • src/training/prompt_guard_fine_tuning/jailbreak_bert_finetuning.py
  • src/training/training_lora/classifier_model_fine_tuning_lora/ft_linear_lora.py
  • src/training/training_lora/pii_model_fine_tuning_lora/pii_bert_finetuning_lora.py
  • src/training/training_lora/prompt_guard_fine_tuning_lora/jailbreak_bert_finetuning_lora.py

📁 website

Owners: @Xunzhuo
Files changed:

  • website/docs/installation/installation.md


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

candle-binding/src/lib.rs

```diff
 tokenizer
     .with_truncation(Some(TruncationParams {
-        max_length: max_length.unwrap_or(512),
+        max_length: max_length.unwrap_or(100000),
```
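
The effect of the old 512-token cap can be illustrated with a toy truncation over plain integer token ids. This is a sketch of the general behaviour, not the tokenizer's actual implementation:

```python
def truncate(token_ids: list, max_length: int) -> list:
    """Keep at most max_length tokens, mirroring tokenizer truncation."""
    return token_ids[:max_length]

prompt = list(range(1000))       # a 1000-token prompt
old = truncate(prompt, 512)      # old cap: the prompt's tail is silently dropped
new = truncate(prompt, 100_000)  # new cap: the prompt passes through intact
```

With the old default, everything after token 512 never reached the model, which is the information loss the PR description refers to.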
Collaborator
Changes to the candle binding go to feat-candle-refactoring.

@FeiDaLI FeiDaLI closed this Oct 5, 2025
Labels: None yet
Projects: None yet

5 participants